A model is a mathematical relationship that comes with a story. Stokey and Zeckhauser (1978) give a definition: “A model is a simplified representation of some aspect of the real world, sometimes of an object, sometimes of a situation or a process”.
A good model reduces a complex situation to a set of essential mechanisms, or dynamics, that an analyst needs in order to make a good decision.
A bad model mischaracterizes the mechanism of interest, is too simple to capture important dynamics, or is too complicated to be calibrated or understood (Basu and Andrews 2013).
All models are wrong, but some are useful. – George Box
A good model is suited to a particular problem, and balances parsimony and realism, simplicity and complexity.
Models also let us study questions for which an empirical study would be infeasible or unethical to conduct in real life.
For example,
What would happen if every injection drug user had access to naloxone? How many fatal overdoses would be averted? Is this intervention program cost-effective?
What would happen if the government eliminated funding for smoking cessation programs?
Models formalize scientific hypotheses about the mechanism that produces a phenomenon of interest.
When data agree with our model, then we may accumulate evidence that the model is correct, or at least that the data do not falsify the model.
When we observe data that do not agree with the predictions of our model, then this might be evidence that our hypotheses are wrong.
Observing that a model fits data well is not a sufficient condition to imply that the model is correct.
What do we mean by “correct”? We mean mechanistic or causal. This goes beyond fitting data well. We mean that a model captures the mechanistic features of the data-generating process that are important for the decisions we want to make.
And, sometimes there are no data! For many of our most pressing public health decision-making challenges, no suitable data exist, and we need to invent a plausible model to evaluate the effects of policy possibilities.
Which of these questions should be addressed by modeling?
Question: How many people inject drugs (e.g. opioids) in my city?
Data: counts of \(m\) individuals’ emergency room visits for overdose, \(X_1,\ldots,X_m\), all positive, for one unit of time (e.g. year). We only see \(X_i\) if person \(i\) had at least one overdose.
Why is this a hard problem?
Let \(N\) be the number of people who inject drugs in the city. This is the unknown quantity we want to estimate.
Let \(X_1,\ldots,X_N\) be the number of times each person has overdosed and been taken to the emergency room during the unit of time.
Let \(M\) be the number who have had at least one overdose. We observe \(M=m\), and we know \(X_1,\ldots,X_m > 0\).
Assume every person who injects drugs experiences overdoses at the same constant rate \(\lambda\) per unit time.
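Therefore, \[ X_i \sim \text{Poisson}(\lambda) \] independently for each \(i=1,\ldots,N\). We only observe the positive counts, so the distribution of an observed \(X_i\) is the Poisson distribution conditioned on \(X_i>0\) (the zero-truncated Poisson): \[ \Pr(X_i=k \mid X_i>0) = \frac{\lambda^k e^{-\lambda}}{k! (1-e^{-\lambda})}, \quad k=1,2,\ldots \] This distribution depends on \(\lambda\) but not on \(N\), so we can estimate \(\lambda\) from the observable data.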
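One way to carry out this estimation is maximum likelihood. The notes do not say which estimator is intended, so the following is only a sketch of a standard approach. Write \(x_1,\ldots,x_m\) for the observed positive counts and \(\bar{x}\) for their mean (notation introduced here for illustration). The log-likelihood under the zero-truncated Poisson is \[ \ell(\lambda) = \sum_{i=1}^m x_i \log\lambda - m\lambda - m\log(1-e^{-\lambda}) - \sum_{i=1}^m \log(x_i!) \] Setting \(\ell'(\lambda)=0\) gives \[ \frac{m\bar{x}}{\lambda} - m - \frac{m e^{-\lambda}}{1-e^{-\lambda}} = 0 \quad\Longrightarrow\quad \bar{x} = \frac{\hat\lambda}{1-e^{-\hat\lambda}} \] This equation has no closed-form solution for \(\hat\lambda\), but the right-hand side is increasing in \(\hat\lambda\) and tends to 1 as \(\hat\lambda \to 0\), so it can be solved numerically whenever \(\bar{x}>1\).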
Then, we know that \[ E[M] = \Pr(X_i>0) \times (\text{number at risk}) \] and so \[ E[M] = (1-e^{-\lambda}) N . \]
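Rearranging, and substituting the observed count \(m\) for \(E[M]\) and the estimate \(\hat\lambda\) for \(\lambda\), we have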
\[ \hat{N} = \frac{m}{1-e^{-\hat\lambda}} \]
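The estimator can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method prescribed by these notes: it assumes \(\hat\lambda\) is obtained by maximum likelihood for the zero-truncated Poisson, which amounts to solving \(\bar{x} = \hat\lambda/(1-e^{-\hat\lambda})\) for the sample mean \(\bar{x}\) of the observed counts. The function names and the ten overdose counts are hypothetical.

```python
import math

def truncated_mean(lam):
    """Mean of a Poisson(lam) count conditional on the count being positive."""
    # -expm1(-lam) equals 1 - exp(-lam) and stays accurate for small lam.
    return lam / -math.expm1(-lam)

def estimate_lambda(counts, tol=1e-10):
    """Solve truncated_mean(lam) = sample mean by bisection (assumed MLE)."""
    xbar = sum(counts) / len(counts)
    if xbar <= 1.0:
        raise ValueError("sample mean must exceed 1 to estimate a positive rate")
    # truncated_mean is increasing and truncated_mean(lam) > lam,
    # so the root lies strictly between 0 and xbar.
    lo, hi = 1e-12, xbar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimate_population(counts):
    """Return (lambda_hat, N_hat) with N_hat = m / (1 - exp(-lambda_hat))."""
    lam_hat = estimate_lambda(counts)
    n_hat = len(counts) / -math.expm1(-lam_hat)
    return lam_hat, n_hat

# Hypothetical data: emergency room visit counts for m = 10 observed people.
counts = [1, 1, 2, 1, 3, 1, 2, 1, 1, 2]
lam_hat, n_hat = estimate_population(counts)
print(f"lambda_hat = {lam_hat:.3f}, N_hat = {n_hat:.1f}")
```

Because \(1-e^{-\hat\lambda} < 1\), the estimate \(\hat{N}\) always exceeds the observed count \(m\). The gap between them is the estimated number of people who inject drugs but had no overdose visit in the period.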
Where was the magic step?
A common distributional assumption for positive and zero \(X_i\)’s,
\[ X_i \sim \text{Poisson}(\lambda) \]
This allowed estimation of \(\lambda\), and implies that
\[ M \sim \text{Binomial}(N,1-e^{-\lambda}) \]
which we can use to estimate \(N\).
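The Binomial claim can be checked by simulation. The sketch below draws Poisson counts for a population of known size, records how many are positive, and compares the mean and variance of \(M\) across replications with the Binomial values \(Np\) and \(Np(1-p)\), where \(p = 1-e^{-\lambda}\). The population size, rate, seed, and number of replications are arbitrary values chosen for illustration, and the Poisson sampler is Knuth's multiplication method, used here only to keep the example free of third-party libraries.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_observed_count(n, lam, rng):
    """Number of people, out of n, with at least one overdose visit."""
    return sum(1 for _ in range(n) if sample_poisson(lam, rng) > 0)

# Illustrative values, not taken from any data set.
N, LAM, REPS = 200, 0.8, 1000
rng = random.Random(0)

draws = [simulate_observed_count(N, LAM, rng) for _ in range(REPS)]
sim_mean = sum(draws) / REPS
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (REPS - 1)

p = 1.0 - math.exp(-LAM)      # Pr(X_i > 0)
theory_mean = N * p           # Binomial mean
theory_var = N * p * (1 - p)  # Binomial variance

print(f"mean of M: simulated {sim_mean:.2f}, Binomial {theory_mean:.2f}")
print(f"var of M:  simulated {sim_var:.2f}, Binomial {theory_var:.2f}")
```

Agreement here only confirms the algebra under the model's own assumptions. It says nothing about whether real overdose counts share a common rate \(\lambda\).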
Some questions:
If you have taken a statistics class, you have seen statistical approaches to explaining variation. For example, consider the “statistical regression model” \[ y = \alpha + \beta x + \epsilon \] If we regard \(x\) as a treatment and \(y\) as a health outcome for a given patient, then we would like to think of \(\beta\) as the “effect” of the treatment.
This model posits a linear relationship between treatment and outcome. Given a one-unit change in \(x\), we expect the outcome \(y\) to change by an increment of \(\beta\).
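To make the interpretation of \(\beta\) concrete, here is a minimal sketch that fits \(\alpha\) and \(\beta\) by ordinary least squares using the closed-form formulas for a single predictor. Least squares is an assumption of this example, since the notes do not specify a fitting method, and the five data points are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = alpha + beta * x for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared deviations of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = y_mean - beta * x_mean
    return alpha, beta

# Hypothetical data: treatment level x and health outcome y for five patients.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 7.1, 8.8]

alpha, beta = fit_line(x, y)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")

# Under the model, raising x by one unit changes the predicted outcome by beta.
predicted_change = (alpha + beta * 3) - (alpha + beta * 2)
```

The fitted \(\beta\) describes an association in these data. Reading it as the effect of treatment requires the causal story that accompanies the model, not the arithmetic alone.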
I think there is no difference between “statistical” and “mechanistic” models, except for the stories we tell about their structure and coefficients. I think:
This is not a statistics course, so we won’t say much more about balancing parsimony and realism for statistical inference.
Basu, Sanjay, and Jason Andrews. 2013. “Complexity in Mathematical Models of Public Health Policies: A Guide for Consumers of Models.” PLoS Medicine 10 (10). Public Library of Science: e1001540.
Stokey, Edith, and Richard Zeckhauser. 1978. Primer for Policy Analysis. WW Norton.